Constituent Reordering and Syntax Models for English-to-Japanese Statistical Machine Translation
Abstract
We present a constituent parsing-based reordering technique that improves a state-of-the-art English-to-Japanese phrase translation system with distortion models by 4.76 BLEU points. The phrase translation model with reordering applied at the pre-processing stage outperforms a syntax-based translation system that incorporates a phrase translation model, a hierarchical phrase-based translation model and a tree-to-string grammar. We also show that combining constituent reordering with the syntax model improves translation quality by an additional 0.84 BLEU points.
Similar resources
Post-ordering by Parsing for Japanese-English Statistical Machine Translation
Reordering is a difficult task in translating between widely different languages such as Japanese and English. We employ the post-ordering framework proposed by Sudoh et al. (2011b) for Japanese-to-English translation and improve upon the reordering method. The existing post-ordering method reorders a sequence of target language words in a source language word order via SMT, while our method re...
Dependency Tree Abstraction for Long-Distance Reordering in Statistical Machine Translation
Word reordering is a crucial technique in statistical machine translation in which syntactic information plays an important role. Synchronous context-free grammar has typically been used for this purpose with various modifications for adding flexibilities to its synchronized tree generation. We permit further flexibilities in the synchronous context-free grammar in order to translate between la...
Japanese-to-English Patent Translation System based on Domain-adapted Word Segmentation and Post-ordering
This paper presents a Japanese-to-English statistical machine translation system specialized for patent translation. Patents are practically useful technical documents, but their translation needs different efforts from general-purpose translation. There are two important problems in the Japanese-to-English patent translation: long distance reordering and lexical translation of many domain-spec...
POS-based Reordering Models for Statistical Machine Translation
We present a novel word reordering model for phrase-based statistical machine translation suited to cope with long-span word movements. In particular, reordering of nouns, verbs and adjectives is modeled by taking into account target-to-source word alignments and the distances between source as well as target words. The proposed model was applied as a set of additional feature functions to re-s...
Syntax-based reordering for statistical machine translation
In this paper, we develop an approach called syntax-based reordering (SBR) to handle the fundamental problem of word ordering for statistical machine translation (SMT). We propose to alleviate the word order challenge by including morphosyntactical and statistical information in the context of a pre-translation reordering framework aimed at capturing short- and long-distance word distortion dependenc...